-
-
Notifications
You must be signed in to change notification settings - Fork 33k
GH-132657: Add lock-free set contains implementation #132290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This makes for longer code vs using the custom LOAD_*/STORE_* macros. However, I think this makes the code more clear.
🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Misc/NEWS.d/next/Core_and_Builtins/2025-07-11-19-57-27.gh-issue-132657.vwDuO2.rst
Outdated
Show resolved
Hide resolved
Hello @eendebakpt, since you have recently worked on the setobject.c source, perhaps you are able to take another look at this. I think the free-threaded scaling improvement is significant. I've merge with the main branch and updated the benchmark results. |
NUM_LOOPS = 20 | ||
|
||
s = frozenset() | ||
def make_set(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the make_set
here in a thread? It just assigns the same frozenset to s
each time.
Maybe change make_set
so that it creates a different frozenset
each time?
#define SET_MARK_SHARED(so) _PyObject_GC_SET_SHARED(so) | ||
|
||
static void | ||
ensure_shared_on_read(PySetObject *so) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: this is identical to the ensure_shared_on_read
from dictobject.c
. The SET_DICT_SHARED
would correspond to SET_SET_SHARED
, which is an odd name so here it is SET_MARK_SHARED
.
FT_ATOMIC_STORE_SSIZE_RELAXED(a->hash, -1); | ||
FT_ATOMIC_STORE_SSIZE_RELAXED(b->hash, -1); | ||
} | ||
if (!SET_IS_SHARED(b) && SET_IS_SHARED(a)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_swap_bodies
is only called in two locations in this file. For both cases the second argument b
is a newly created temporary which is discarded afterwards. So we could replace part (all?) of these branches with asserts. Part of the other work (e.g. copying back to b
) might not be needed for the same reason.
Py_BEGIN_CRITICAL_SECTION(so); | ||
rv = set_contains_lock_held(so, key); | ||
Py_END_CRITICAL_SECTION(); | ||
return set_contains_entry(so, key, hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return set_contains_entry(so, key, hash); |
{ | ||
int status; | ||
setentry *entry; | ||
if (PyFrozenSet_CheckExact(so)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this path can be taken. set_lookkey_threadsafe
is only called in set_contains_entry. The set_contains_entry
is only called in _PySet_Contains
. _PySet_Contains
is called in set___contains___impl
and in bytecodes.c _CONTAINS_OP_SET
(the latter can have frozenset as input)
This roughly follows what was done for dictobject to make a lock-free lookup operation. The set contains operation scales much better when used from multiple-threads. The frozenset contains performance seems unchanged (as already lock-free).
Summary of changes:
set_lookkey()
intoset_do_lookup()
which now takes a function pointer that does the entry comparison. This is similar to dictobject anddo_lookup()
. In an optimized build, the comparison function is inlined and there should be no performance cost to this.set_do_lookup
to return a status separately from the entry value.set_compare_frozenset()
and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)FT_ATOMIC_*
macros as needed for atomic loads and storesmemcpy()
.so->table
to NULL while the change is a happening. Assign the real table array after the change is done.Free-threading scaling benchmark results from the attached scripts (result for 8 cores in parallel). This is a modified version of the
ftscalingbenchmark.py
script.ftscaling_set.py.txt